Community Resource Index
Objectives: Identify the census tracts most in need of resources, based on educational data.
Scope: The four major counties in the DFW area (Dallas, Collin, Denton, and Tarrant).
Click here to see the index on a map.
“Morally and economically, allowing so many of our children to grow up so far from opportunity threatens our future. If we are willing to embrace the challenge of working collectively and strategically, Dallas can cut childhood poverty in half within a single generation.”
- Mayor Mike Rawlings
Child poverty is a serious problem. Children living in poverty are more likely than their peers to experience repeated trauma, and that trauma is likely to lead to reduced opportunity. This cycle is hard to break unless resources reach the right families. The goal of this project is to build an index that improves resource allocation by helping to prioritize and deploy funding to the right agencies or schools. The index comprises five categories: community, economics, education, health, and family. In this project, we focused on education.
For the education sub-index, we use 11 features and 4 target variables as indicators. The features are early education enrollment, school poverty, student-teacher ratio, free lunch, reduced lunch, Title I school, high-quality ECE centers, math proficiency, reading proficiency, high school graduation rate, and second education rate. The target variables are household income, mental health, physical health, and poverty probability index.
First, we applied a KNN imputer to handle missing values. Instead of imputing all missing values at once, we split the data by county and imputed each county separately, because the nonprofit organizations we worked with told us that conditions differ greatly from county to county. We then scaled all variables with MinMaxScaler so that they are comparable. We also reversed three target variables (mental health, physical health, and poverty probability index) by multiplying them by -1, so that a higher score is better for every target variable. After preprocessing, we used simple linear regression to avoid multicollinearity among the independent variables: we obtained coefficients by regressing each target variable on the independent variables, one predictor at a time.
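The preprocessing and regression steps described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the column names, the toy data, the `n_neighbors=5` setting, and the helper function names are all assumptions made for the example. It uses scikit-learn's `KNNImputer` and `MinMaxScaler`, and `numpy.polyfit` for the single-predictor fits.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler


def impute_by_county(df, county_col, value_cols, n_neighbors=5):
    """Fit a separate KNNImputer on each county's rows (hypothetical helper).

    Assumes no column is entirely missing within a county, since
    KNNImputer drops such columns by default.
    """
    out = df.copy()
    for _, group in df.groupby(county_col):
        imputer = KNNImputer(n_neighbors=n_neighbors)
        out.loc[group.index, value_cols] = imputer.fit_transform(group[value_cols])
    return out


def scale_and_reverse(df, value_cols, reverse_cols):
    """Min-max scale every variable, then multiply the reversed ones by -1."""
    out = df.copy()
    out[value_cols] = MinMaxScaler().fit_transform(out[value_cols])
    out[reverse_cols] = -out[reverse_cols]
    return out


def simple_regression_slopes(df, feature_cols, target_cols):
    """Slope of each target regressed on each feature, one feature at a time."""
    slopes = pd.DataFrame(index=feature_cols, columns=target_cols, dtype=float)
    for f in feature_cols:
        for t in target_cols:
            # polyfit with deg=1 returns [slope, intercept]
            slope, _intercept = np.polyfit(df[f].to_numpy(), df[t].to_numpy(), deg=1)
            slopes.loc[f, t] = slope
    return slopes


# Toy data: made-up values for two counties, two features, two targets.
rng = np.random.default_rng(0)
n = 40
toy = pd.DataFrame({
    "county": ["Dallas", "Tarrant"] * (n // 2),
    "math_proficiency": rng.uniform(20, 90, n),
    "free_lunch": rng.uniform(0, 100, n),
    "household_income": rng.uniform(20_000, 120_000, n),
    "mental_health": rng.uniform(5, 25, n),
})
toy.loc[[3, 10], "math_proficiency"] = np.nan
toy.loc[[7], "free_lunch"] = np.nan

features = ["math_proficiency", "free_lunch"]
targets = ["household_income", "mental_health"]

imputed = impute_by_county(toy, "county", features + targets)
prepared = scale_and_reverse(imputed, features + targets, reverse_cols=["mental_health"])
slopes = simple_regression_slopes(prepared, features, targets)
print(slopes)
```

Fitting one feature at a time means each coefficient ignores the other features entirely, which is what sidesteps multicollinearity, at the cost of not controlling for overlap between features.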
The following table shows the resulting coefficients and the weight for each feature:
Features | Household Income | Mental Health - Reversed | Physical Health - Reversed | Poverty Probability Index - Reversed | Average | Average-Scaled | Wj | Weights |
---|---|---|---|---|---|---|---|---|
Early Education Enrollment | 0.17 | 0.27 | 0.25 | 0.14 | 0.21 | 6.44 | 3.72 | 0.34 |
School Poverty | 0.26 | 0.39 | 0.35 | 0.23 | 0.31 | 9.59 | 5.29 | 0.48 |
Student-Teacher Ratio | -0.12 | -0.35 | -0.32 | -0.15 | -0.23 | -7.26 | -3.13 | -0.29 |
Free Lunch | -0.27 | -0.41 | -0.44 | -0.29 | -0.35 | -10.94 | -4.97 | -0.45 |
Reduced Lunch | -0.11 | -0.07 | -0.02 | 0.05 | -0.04 | -1.20 | -0.10 | -0.01 |
Title I School | -0.16 | -0.21 | -0.22 | -0.15 | -0.19 | -5.74 | -2.37 | -0.22 |
High-Quality ECE Centers | 0.07 | -0.19 | -0.08 | -0.40 | -0.15 | -4.71 | -1.86 | -0.17 |
Math Proficiency | 0.37 | 0.51 | 0.51 | 0.29 | 0.42 | 13.09 | 7.05 | 0.64 |
Reading Proficiency | 0.37 | 0.53 | 0.54 | 0.33 | 0.44 | 13.76 | 7.38 | 0.67 |
High School Graduation Rate | -0.45 | -0.74 | -0.73 | -0.41 | -0.58 | -18.01 | -8.51 | -0.77 |
Second Education Rate | 0.39 | 0.66 | 0.62 | 0.39 | 0.52 | 15.99 | 8.49 | 0.77 |
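As a reading aid for the table, the Average column appears to be the mean of the four per-target coefficients in each row. The short check below, using a few rows copied from the table, is only a sanity check under that assumption. Because the printed coefficients are already rounded to two decimals, the recomputed mean can differ from the reported one by up to about 0.01. How the Average-Scaled, Wj, and Weights columns are derived is not stated here, so they are not reproduced.

```python
from statistics import mean

# (four per-target coefficients, reported Average), copied from the table
rows = {
    "Math Proficiency": ([0.37, 0.51, 0.51, 0.29], 0.42),
    "Free Lunch": ([-0.27, -0.41, -0.44, -0.29], -0.35),
    "Title I School": ([-0.16, -0.21, -0.22, -0.15], -0.19),
    "High School Graduation Rate": ([-0.45, -0.74, -0.73, -0.41], -0.58),
}

# Difference between the recomputed row mean and the reported Average
differences = {
    name: abs(mean(coefs) - reported) for name, (coefs, reported) in rows.items()
}

for name, diff in differences.items():
    print(f"{name}: |recomputed - reported| = {diff:.4f}")
```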
Finally, for every census tract, we apply the weights to each feature and sum the weighted values to obtain the index. We then rescale the index to the range 0 to 100. A lower score means the census tract is in a worse situation with respect to child poverty.
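The final step, a weighted sum followed by rescaling to 0 to 100, might look like the sketch below. The three weights are taken from the Weights column of the table (Math Proficiency, Free Lunch, Student-Teacher Ratio). The tract values are invented for illustration, and the function name `build_index` is hypothetical. Min-max rescaling is assumed here as the way to map the raw sums onto 0 to 100.

```python
import numpy as np


def build_index(scaled_features, weights):
    """Weighted sum per tract, then min-max rescale to the range 0..100."""
    raw = scaled_features @ weights
    lo, hi = raw.min(), raw.max()
    if hi == lo:
        raise ValueError("all tracts have the same raw score; cannot rescale")
    return 100 * (raw - lo) / (hi - lo)


# Weights from the table: Math Proficiency, Free Lunch, Student-Teacher Ratio
weights = np.array([0.64, -0.45, -0.29])

# Made-up, already min-max-scaled feature values for three census tracts
tracts = np.array([
    [0.9, 0.1, 0.2],
    [0.5, 0.5, 0.5],
    [0.1, 0.9, 0.8],
])

index = build_index(tracts, weights)
print(np.round(index, 1))
```

In this toy example the first tract, with high math proficiency and a low free-lunch share, gets the highest score, and the third tract gets the lowest, matching the reading that lower scores indicate greater need.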